Double-ended prediction of the naturalness ratings of the Blizzard Challenge 2008-2013
Abstract
In this paper we describe a double-ended (i.e., reference-based or intrusive) approach to the objective quality estimation of synthetic speech. It uses a linear regression model with easily interpretable parameters. The model was trained and evaluated on the English data from the 2008 to 2013 Blizzard Challenges (BC) [1], the largest publicly available resource of listener-evaluated synthetic speech. To our knowledge, this is the first attempt to train and evaluate a speech quality predictor on the whole data set. Predicting the naturalness of the systems participating in the BC is difficult because some of them are very close in quality. Under a leave-one-system-out evaluation, our best results are Pearson correlation coefficients of 0.60 at the sentence level and 0.84 at the system level. These far outperform the ITU-T standard PESQ [2] for double-ended speech quality evaluation on this data.
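The abstract does not name the model's input features or its fitting procedure. The sketch below is therefore only an illustration. It shows how a linear regression predictor can be scored with a leave-one-system-out protocol at both the sentence and system level. It makes several assumptions that do not come from the paper. The model is fit by ordinary least squares. Each sentence is described by a hypothetical numeric feature vector. System-level scores are per-system means of the sentence scores. The function names (`fit_ols`, `leave_one_system_out`) and the toy data are illustrative and are not the authors' code.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve the square system a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(features, targets):
    """Ordinary least squares via the normal equations; returns [intercept, w1, ..., wk]."""
    X = [[1.0] + list(f) for f in features]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, f):
    """Apply a fitted linear model: intercept plus weighted sum of features."""
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))


def leave_one_system_out(samples):
    """samples: list of (system_id, feature_vector, listener_score), one per sentence.

    Each system's sentences are predicted by a model trained only on the other
    systems. Returns (sentence-level r, system-level r).
    """
    systems = sorted({s for s, _, _ in samples})
    preds, truth, ids = [], [], []
    for held_out in systems:
        train = [(f, y) for s, f, y in samples if s != held_out]
        w = fit_ols([f for f, _ in train], [y for _, y in train])
        for s, f, y in samples:
            if s == held_out:
                preds.append(predict(w, f))
                truth.append(y)
                ids.append(s)
    sentence_r = pearson(preds, truth)
    # System level (assumed): correlate per-system means of predicted and observed scores.
    sys_pred = [mean(p for p, s in zip(preds, ids) if s == k) for k in systems]
    sys_true = [mean(t for t, s in zip(truth, ids) if s == k) for k in systems]
    return sentence_r, pearson(sys_pred, sys_true)


# Toy data (not Blizzard Challenge data): scores are an exact linear function
# 1 + 2*f1 - f2 of two hypothetical features, so both correlations come out at 1.
samples = [
    ("A", (0.0, 1.0), 0.0), ("A", (1.0, 0.0), 3.0),
    ("B", (2.0, 1.0), 4.0), ("B", (0.5, 2.0), 0.0),
    ("C", (1.5, 0.5), 3.5), ("C", (3.0, 2.0), 5.0),
]
sentence_r, system_r = leave_one_system_out(samples)
print(round(sentence_r, 6), round(system_r, 6))  # 1.0 1.0
```

Holding out a whole system, rather than random sentences, keeps the model from ever seeing any output of the system it is scoring. This is why the protocol is a stricter test of generalization to unseen synthesizers. On real listener data the two correlations differ. Averaging over a system's sentences cancels per-sentence noise, which is consistent with the system-level figure in the abstract (0.84) being higher than the sentence-level one (0.60).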
Publication date: 2015